ABSTRACT

Visual Understanding is a growing field of research thanks to
advances in image processing, object detection, classification,
and advanced computational intelligence techniques. Hidden
Markov Models (HMMs) are one of these techniques and have been
used extensively for this problem. This paper introduces a new
type of HMM, called the Evidence Feed-Forward Hidden Markov
Model, that not only increases the classification rate for
sparse, messy data but also outlines a new theory that changes
the way HMMs are conceived. Data is taken from simulated images
of people's actions. The data is deliberately over-processed to
decrease the likelihood of correct classification. Finally, the
over-processed, sparse data is used to train and test the
Evidence Feed-Forward HMM and the standard HMM, and the results
are compared.

1.  INTRODUCTION 

Interest in Visual Understanding (VU) is increasing as advances
in technology require VU algorithms to be taken out of the
research labs and into fully developed programs and systems. A
research sub-area of VU is Visual Human Intent Analysis (VHIA).
This area may also be referred to as visual human behavior
identification, action or activity recognition, or understanding
human actions from visual cues. In static security systems,
visual human behavior identification systems will aid or replace
security guards monitoring CCTV feeds. Television stations will
require activity recognition systems to automatically categorize
and store, or quickly search for, certain scenes in a database.
The U.S. Army is pushing robotics to replace the soldier, which
requires understanding human actions from visual cues so that a
robot can recognize hostile actions and take appropriate action
to secure itself. These are just a few areas where VHIA will
advance the current state of the art in the development and use
of future systems.

This paper concentrates on new research in the area of Hidden
Markov Models (HMMs), to the extent of redefining the way HMMs
are built. Section 2 gives background on recent work in Visual
Intent Analysis classification. Section 3 discusses the Evidence
Feed-Forward Hidden Markov Model and provides an illustrative
example. Section 4 gives the equations for solving the three
common HMM problems. Section 5 shows the results of the Evidence
Feed-Forward HMM on a problem with over-processed data, and
Section 6 summarizes.

2. BACKGROUND

Visual Intent Analysis classification has been approached in
several ways. M. Cristani et al. [1] use non-traditional AI
methods, taking in both audio and visual data to determine
simple events in an office. First they remove foreground objects
and segment the images in the sequence. This output is coupled
with the audio data, and a threshold detection process is used
to identify unusual events. These event sequences are put into
an audio-visual concurrence (AVC) matrix to compare with known
AVC events.

Template matching is performed by M. Dimitrijevic et al. [2].
They developed a template database of actions based on five men
and three women. Each human action is represented by three
frames of the person's 2D silhouette at different stages of the
activity: the frame when the person first touches the ground
with one foot, the frame at the midstride of the step, and the
end frame when the person finishes touching the ground with the
same foot. The three-frame sets were taken from seven camera
positions. When determining the event, they use a modified
Chamfer distance calculation to match against the template
sequences in the database.

Traditional AI methods include the work of H. Stern et al. [3],
who created a prototype fuzzy system for picture understanding
of surveillance cameras. Their model is split into three parts:
a pre-processing module, a static object fuzzy system module,
and a dynamic temporal fuzzy system module. The static fuzzy
system module classifies pre-processed data as a single person,
two people, three people, many people, or no people. The dynamic
fuzzy system determines the intent of the person based on the
temporal movements.

Another common approach is using grammars to describe the
sequence of movements that make up the action. A. Ogale et al.
[4] use probabilistic context-free grammars (PCFG) on short
action sequences of a person from video. Body poses are stored
as silhouettes, which are used in the construction of the PCFG.
Pairs of frames are constructed based on their time slot: the
poses from frames 1 and 2 are paired, the poses from frames 2
and 3 are paired, and so on. These pairs construct the PCFG for
the given action. When testing the algorithm, the same procedure
is followed. Comparing the testing data with the trained data is
accomplished through Bayes' rule:
$P(s_k \mid p_i) = P(p_i \mid s_k) P(s_k) / P(p_i)$, where $s_k$
is the $k$th silhouette and $p_i$ is the $i$th pose.

A number of traditional and non-traditional Hidden Markov Models
(HMMs) have been used to understand people's actions from visual
sequences. Yamato et al. [5] used HMMs to recognize six tennis
strokes, with a 25x25 mesh feature matrix describing body
positions in each frame. Wilson and Bobick [6] use a Parametric
Hidden Markov Model (PHMM) to recognize hand gestures. Oliver et
al. [7] developed a method to detect and classify interactions
between people using a Coupled Hidden Markov Model (CHMM) based
on simulations. Multi-Observation Hidden Markov Models (MOHMM)
are discussed in both [8] and [9] by Xiang and Gong, for
recognizing break points in video content to separate activities
and for detecting piggybacking of people going through a
security door, respectively.

3.  EVIDENCE  FEED-FORWARD  HIDDEN 
MARKOV  MODELS  INTRODUCTION 

Evidence Feed-Forward HMMs are HMMs that involve positive
feed-forward from the current observation nodes into the nodes
of the future observation. This is more than an extension to
HMMs, like Parametric HMMs or Hierarchical HMMs, because it
relaxes the need for complete independence, disregards the rules
of causality suggested by HMM theory, and provides a link from
evidence to evidence that does not pass through the hidden
nodes. In the strict sense, the current state of a Markov model
depends only on the previous state; in the proposed Evidence
Feed-Forward HMM, this is no longer the case. However, an
Evidence Feed-Forward HMM can still be classified as an HMM,
since there is a hidden layer, a network of choices, and
evidence that is observed. The learning algorithms and the
applications are similar to those of standard HMMs. The
difference comes in the interpretation of how a process should
model a real-world event.

As an example, take the commonly used Weather Example: a person
is locked inside a windowless building and would like to know
whether or not it is raining outside. The only evidence he has
is whether he sees his boss come inside with or without an
umbrella. He constructs an HMM to make his decision. In Figure
1, the hidden layer is represented by the blue nodes Rain (R)
and No Rain (NR). The evidence is represented by the yellow
nodes Umbrella (U) and No Umbrella (NU). This example shows that
the evidence (observation) depends only on the hidden layer and
not vice versa. Also, the hidden layer depends only on the
previous day's weather (with some probability). However, we have
not taken into account the effect of the evidence on the next
day's evidence and the weather.


Fig.  1:  Weather  Example  using  HMM. 


In this example, suppose the boss comes into the building
without an umbrella and it is raining. It would then be logical
to assume that the boss is more likely to carry an umbrella the
next day. This changes the thinking about the evidence portion
of HMMs. Previous HMMs assume that the evidence is based only on
the current hidden node, so whether or not an umbrella was seen
today has no effect on whether one is seen the next day.
However, if we look closer, we are also observing the actions of
the boss, so there is a probability associated with his actions
(which turn out to be the observations in this HMM). This idea
connects the evidence of each event to the evidence of the next
event.

Connecting observations to observations makes the network very
complex. However, it can be simplified by assuming that the
probability of going to a future observation depends only on the
probability of the current observation and the current hidden
state. Applying this to the example, the probability of having
an umbrella given that it rained the previous day and the boss
did not have his umbrella is the same without needing to know
the current day's weather. That is, the boss's likelihood of
carrying his umbrella increases by the same amount whether it is
raining or not. This does not mean that the likelihood of the
boss carrying an umbrella is the same, only that the increase is
the same. So, if the likelihood of the boss carrying an umbrella
when it is raining is already very high compared to not carrying
one, then this increase will probably not have a large effect on
the outcome of the boss not carrying an umbrella when it is
raining. See Figure 2 for a pictorial view of this example using
Evidence Feed-Forward HMMs.


Fig.  2:  Weather  example  using  Evidence  Feed- 
Forward  Hidden  Markov  Models. 
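
The evidence-to-evidence link in Fig. 2 can be made concrete
with a small Python sketch. All numbers below are hypothetical
(the paper gives none), as are the names `B`, `C`,
`feed_forward_weight`, and `umbrella_boost`; multiplying the
emission term by the feed-forward term follows the form used in
Section 4.

```python
# Hypothetical weather model; every probability here is illustrative only.
RAIN, NO_RAIN = 0, 1              # hidden states
UMBRELLA, NO_UMBRELLA = 0, 1      # observations

# B[j][k]: probability of observation k while in hidden state j.
B = [[0.9, 0.1],
     [0.2, 0.8]]

# C[i][h][k]: probability of observing k on the next day, given hidden
# state i and observation h on the current day (evidence-to-evidence link).
C = [
    [[0.6, 0.4],   # rain, boss had an umbrella
     [0.8, 0.2]],  # rain, boss had no umbrella (he got wet)
    [[0.5, 0.5],   # no rain, boss had an umbrella
     [0.4, 0.6]],  # no rain, boss had no umbrella
]


def feed_forward_weight(prev_state, prev_obs, cur_state, cur_obs):
    """Emission term for the current day scaled by the feed-forward term."""
    return B[cur_state][cur_obs] * C[prev_state][prev_obs][cur_obs]


def umbrella_boost(cur_state):
    """Ratio of umbrella weights after a rainy day without vs. with an umbrella."""
    wet = feed_forward_weight(RAIN, NO_UMBRELLA, cur_state, UMBRELLA)
    dry = feed_forward_weight(RAIN, UMBRELLA, cur_state, UMBRELLA)
    return wet / dry


boost_if_rain = umbrella_boost(RAIN)
boost_if_dry = umbrella_boost(NO_RAIN)
```

Under these assumed numbers the boost is the same factor for
both values of the current hidden state, even though the
underlying umbrella likelihoods (0.9 versus 0.2) differ, which
mirrors the statement that only the increase is shared.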

4. EVIDENCE FEED-FORWARD HIDDEN
MARKOV MODELS THEORY

Just like standard HMMs, the three common problems an Evidence
Feed-Forward HMM should solve are:

1. Given an observation sequence $O = O_1 O_2 \ldots O_T$ and a
model $\lambda = (A, B, C, \pi)$, compute the probability of the
observation sequence given the model, $P(O \mid \lambda)$.

2. Given the observation sequence $O$ and the model $\lambda$,
find the optimal path through the hidden state sequence
$Q = q_1 q_2 \ldots q_T$.

3. Given a number of observations, learn the optimal values of
the parameters of $\lambda = (A, B, C, \pi)$ to maximize
$P(O \mid \lambda)$ for all the observations.

For a detailed tutorial on how HMMs solve these problems, the
reader is referred to Rabiner [10]. Here, the model parameters
are as follows: $A$ is a 2D matrix holding the elements $a_{ij}$
= probability of going from state $q_t = S_i$ to
$q_{t+1} = S_j$, for all $1 \le i, j \le N$, where $N$ is the
total number of states; $B$ is a 2D matrix holding the elements
$b_{jk}$ = probability of observation $O_t = V_k$ given that the
model is in state $j$, with $1 \le k \le M$ (the total number of
possible observations is $M$); $C$ is a 3D matrix holding
$c_i(h,k)$ = probability of observing $O_{t+1} = V_k$ given that
the model is in state $q_t = S_i$ and observing $O_t = V_h$;
$\pi$ is a vector of $\pi_i$ = initial probability of being in
state $q_1 = S_i$.
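
As a rough illustration of the shapes involved, the sketch below
stores the four parameters as nested Python lists and checks
that each row behaves as a probability distribution. The values,
the two-state and two-symbol sizes, and the helper names
`is_distribution` and `validate_model` are assumptions made for
this example only; indices are 0-based here rather than 1-based
as in the text.

```python
# Hypothetical model with N = 2 hidden states and M = 2 observation symbols.
pi = [0.5, 0.5]                      # pi[i]: initial probability of state i
A = [[0.7, 0.3],                     # A[i][j]: transition from state i to j
     [0.3, 0.7]]
B = [[0.9, 0.1],                     # B[j][k]: symbol k emitted in state j
     [0.2, 0.8]]
C = [[[0.6, 0.4], [0.8, 0.2]],       # C[i][h][k]: next symbol k, given
     [[0.5, 0.5], [0.4, 0.6]]]       # state i and current symbol h


def is_distribution(row, tol=1e-9):
    """True if the entries are non-negative and sum to one."""
    return all(p >= 0.0 for p in row) and abs(sum(row) - 1.0) <= tol


def validate_model(pi, A, B, C):
    """Check the N, N x N, N x M, and N x M x M shapes and row sums."""
    n = len(pi)
    m = len(B[0])
    if not is_distribution(pi):
        return False
    if len(A) != n or any(len(r) != n or not is_distribution(r) for r in A):
        return False
    if len(B) != n or any(len(r) != m or not is_distribution(r) for r in B):
        return False
    if len(C) != n:
        return False
    for per_state in C:
        if len(per_state) != m:
            return False
        if any(len(r) != m or not is_distribution(r) for r in per_state):
            return False
    return True


model_ok = validate_model(pi, A, B, C)
```

The only structural difference from a standard HMM is the extra
table `C`, which adds N x M x M entries to the parameter set.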

To solve the first problem, we develop a forward algorithm
procedure to compute
$\alpha_i(t) = p(O_1, O_2, \ldots, O_t, q_t = i \mid \lambda)$.
When $t = T$, $P(O \mid \lambda)$ is found by summing all the
$\alpha_i$'s at time $T$. The forward algorithm procedure is:

1. $\alpha_i(1) = \pi_i b_i(O_1)$ for all $i$, $1 \le i \le N$,
where $b_i(O_1) = b_{ih}$ for the $h$ such that $O_1 = V_h$.

2. $\alpha_j(t+1) = \left[\sum_{i=1}^{N} \alpha_i(t)\, a_{ij}\, c_i(O_t, O_{t+1})\right] b_j(O_{t+1})$
for $1 \le t < T$, where $c_i(O_t, O_{t+1})$ is $c_i(h,k)$ for
$O_t = V_h$ and $O_{t+1} = V_k$, and $N$ is the total number of
hidden states.

3. $p(O \mid \lambda) = \sum_{i=1}^{N} \alpha_i(T)$.

The final probability $p(O \mid \lambda)$ is the probability we
are looking for.
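
The three steps above can be sketched in plain Python as
follows. This is a minimal illustration, not the authors'
implementation: the function name `forward`, the 0-based
indexing, and the numeric model values are assumptions, and no
scaling is applied, so long sequences would underflow.

```python
# Hypothetical two-state, two-symbol model (illustrative values only).
pi = [0.5, 0.5]
A = [[0.7, 0.3], [0.3, 0.7]]
B = [[0.9, 0.1], [0.2, 0.8]]
C = [[[0.6, 0.4], [0.8, 0.2]],
     [[0.5, 0.5], [0.4, 0.6]]]


def forward(obs, pi, A, B, C):
    """Return (alpha, likelihood) using the feed-forward recursion.

    obs is a list of symbol indices; alpha[t][i] corresponds to
    alpha_i(t + 1) in the text because Python lists are 0-based.
    """
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    # Step 1: initialization with the ordinary emission term.
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    # Step 2: induction, weighting each predecessor by c_i(O_t, O_{t+1}).
    for t in range(T - 1):
        h, k = obs[t], obs[t + 1]
        for j in range(n):
            total = sum(alpha[t][i] * A[i][j] * C[i][h][k] for i in range(n))
            alpha[t + 1][j] = total * B[j][k]
    # Step 3: termination.
    return alpha, sum(alpha[T - 1])


alpha, likelihood = forward([0, 0], pi, A, B, C)
```

Setting every entry of `C` to 1 reduces the recursion to the
standard HMM forward algorithm, which is a convenient sanity
check when comparing the two models.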

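A backward algorithm procedure can also be developed to find
$P(O \mid \lambda)$. The variable $\beta_i$ must be created such
that
$\beta_i(t) = p(O_{t+1}, O_{t+2}, \ldots, O_T \mid q_t = i, \lambda)$.

1. $\beta_i(T) = 1$.

2. $\beta_i(t) = \left[\sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_j(t+1)\right] c_i(O_t, O_{t+1})$.

3. $p(O \mid \lambda) = \sum_{i=1}^{N} \beta_i(1)\, \pi_i\, b_i(O_1)$.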
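
A matching sketch of the backward procedure is given below,
again under assumptions: the name `backward`, the 0-based
indexing, and the numeric values are illustrative and do not
come from the paper.

```python
# Hypothetical two-state, two-symbol model (illustrative values only).
pi = [0.5, 0.5]
A = [[0.7, 0.3], [0.3, 0.7]]
B = [[0.9, 0.1], [0.2, 0.8]]
C = [[[0.6, 0.4], [0.8, 0.2]],
     [[0.5, 0.5], [0.4, 0.6]]]


def backward(obs, pi, A, B, C):
    """Return (beta, likelihood) using the feed-forward backward recursion.

    beta[t][i] corresponds to beta_i(t + 1) in the text (0-based lists).
    """
    n, T = len(pi), len(obs)
    # Step 1: beta at the final time step is 1 for every state.
    beta = [[1.0] * n for _ in range(T)]
    # Step 2: walk backwards; c_i(O_t, O_{t+1}) does not depend on j,
    # so it multiplies the whole sum over successor states.
    for t in range(T - 2, -1, -1):
        h, k = obs[t], obs[t + 1]
        for i in range(n):
            total = sum(A[i][j] * B[j][k] * beta[t + 1][j] for j in range(n))
            beta[t][i] = total * C[i][h][k]
    # Step 3: combine with the initial and first-emission probabilities.
    likelihood = sum(beta[0][i] * pi[i] * B[i][obs[0]] for i in range(n))
    return beta, likelihood


beta, likelihood = backward([0, 0], pi, A, B, C)
```

For the same model and observation sequence, this likelihood
should agree with the value obtained from the forward procedure,
which gives a simple consistency check on an implementation.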

It should be noted that the probabilities of the observations
given the model, computed with both the forward and backward
algorithms, are used later to help solve the remaining two
Evidence Feed-Forward HMM problems.


To solve the second problem, computing the optimal path of
hidden states from the observations given the model, one must
make use of both the backward and forward algorithms. The
optimal path is taken to be the path that gives the maximum
probability of the state sequence given the observations and the
model; that is, we are maximizing $P(Q \mid O, \lambda)$. To do
this we create two new variables, $\delta$ and Path.

1. $\delta_1(i) = \pi_i b_i(O_1)$. Path = [].

2. $\delta_t(j) = \max_{1 \le i \le N} \left[(\delta_{t-1}(i)\, a_{ij})\, b_j(O_t)\, c_i(O_{t-1}, O_t)\right]$.
Record the state at which this is maximized and add it to Path.

3. The final step is finding the state which maximizes
$\delta_T(i)$ for $1 \le i \le N$.
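
The path search can be sketched as below. The text only says
that the maximizing state is added to Path; storing backpointers
and tracing them from the best final state is the usual Viterbi
bookkeeping and is an assumption here, as are the name `viterbi`
and the numeric values.

```python
# Hypothetical two-state, two-symbol model (illustrative values only).
pi = [0.5, 0.5]
A = [[0.7, 0.3], [0.3, 0.7]]
B = [[0.9, 0.1], [0.2, 0.8]]
C = [[[0.6, 0.4], [0.8, 0.2]],
     [[0.5, 0.5], [0.4, 0.6]]]


def viterbi(obs, pi, A, B, C):
    """Return (path, score) maximizing delta under the feed-forward model."""
    n, T = len(pi), len(obs)
    delta = [[0.0] * n for _ in range(T)]
    back = [[0] * n for _ in range(T)]   # back[t][j]: best predecessor of j
    # Step 1: initialization.
    for i in range(n):
        delta[0][i] = pi[i] * B[i][obs[0]]
    # Step 2: c_i depends on the predecessor i, so it sits inside the max.
    for t in range(1, T):
        h, k = obs[t - 1], obs[t]
        for j in range(n):
            scores = [delta[t - 1][i] * A[i][j] * B[j][k] * C[i][h][k]
                      for i in range(n)]
            best = scores.index(max(scores))
            delta[t][j] = scores[best]
            back[t][j] = best
    # Step 3: pick the best final state, then follow the backpointers.
    last = delta[T - 1].index(max(delta[T - 1]))
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, delta[T - 1][last]


path, score = viterbi([0, 0, 1], pi, A, B, C)
```

With states 0 and 1 read as Rain and No Rain and symbols 0 and 1
as Umbrella and No Umbrella, the call decodes two umbrella days
followed by a day without one.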

To solve the final problem, we use the Baum-Welch algorithm to
optimize the parameters. First the equation is separated into
four parts and a constraint is applied to each:

$\sum_{i=1}^{N} \pi_i = 1$,
$\sum_{j=1}^{N} a_{ij} = 1$ for all $i$,
$\sum_{k=1}^{M} b_{jk} = 1$ for all $j$,
$\sum_{k=1}^{M} c_i(h,k) = 1$ for all $i$ and $h$.

Next, create the variable
$\gamma_i(t) = p(q_t = i \mid O, \lambda)$, the probability of
being in state $i$ at time $t$ for sequence $O$ and model
$\lambda$, and the variable
$\xi_{ij}(t) = P(q_t = i, q_{t+1} = j \mid O, \lambda)$, the
probability of being in state $i$ at time $t$ and state $j$ at
time $t+1$ given the observations and the model.
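
One way to compute these two variables from the forward and
backward quantities is sketched below. The paper does not spell
out the formulas; the expressions used here are the standard
Rabiner-style ones with the extra factor $c_i(O_t, O_{t+1})$
inserted, which is an assumption, as are the function names and
numeric values.

```python
# Hypothetical two-state, two-symbol model (illustrative values only).
pi = [0.5, 0.5]
A = [[0.7, 0.3], [0.3, 0.7]]
B = [[0.9, 0.1], [0.2, 0.8]]
C = [[[0.6, 0.4], [0.8, 0.2]],
     [[0.5, 0.5], [0.4, 0.6]]]


def forward(obs, pi, A, B, C):
    """alpha[t][i] for the feed-forward model (0-based time index)."""
    n, T = len(pi), len(obs)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(T - 1):
        h, k = obs[t], obs[t + 1]
        alpha.append([
            sum(alpha[t][i] * A[i][j] * C[i][h][k] for i in range(n)) * B[j][k]
            for j in range(n)
        ])
    return alpha


def backward(obs, n, A, B, C):
    """beta[t][i] for the feed-forward model (0-based time index)."""
    T = len(obs)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        h, k = obs[t], obs[t + 1]
        for i in range(n):
            total = sum(A[i][j] * B[j][k] * beta[t + 1][j] for j in range(n))
            beta[t][i] = total * C[i][h][k]
    return beta


def posteriors(obs, pi, A, B, C):
    """Return (gamma, xi): state and state-pair posteriors given obs."""
    n, T = len(pi), len(obs)
    alpha = forward(obs, pi, A, B, C)
    beta = backward(obs, n, A, B, C)
    likelihood = sum(alpha[T - 1])
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    xi = []
    for t in range(T - 1):
        h, k = obs[t], obs[t + 1]
        xi.append([[alpha[t][i] * A[i][j] * C[i][h][k] * B[j][k]
                    * beta[t + 1][j] / likelihood
                    for j in range(n)] for i in range(n)])
    return gamma, xi


gamma, xi = posteriors([0, 1, 1, 0], pi, A, B, C)
```

Under these definitions each `gamma[t]` sums to one, and summing
`xi[t][i]` over the successor state recovers `gamma[t][i]`,
which are the properties the re-estimation step relies on.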

Using Lagrange multipliers and the constraints above, we end up
with the re-estimated parameters as:

$\bar{\pi}_i$ = expected number of times in state $i$ at time
$t = 1$,

$\bar{\pi}_i = \gamma_i(1)$.

$\bar{a}_{ij}$ = (expected number of transitions from state $i$
to state $j$) / (expected number of transitions from state $i$),

$\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_{ij}(t)}{\sum_{t=1}^{T-1} \gamma_i(t)}$.


$\bar{b}_{jk}$ = (expected number of times in state $j$ and
observing $V_k$) / (expected number of times in state $j$),

$\bar{b}_{jk} = \frac{\sum_{t=1,\ O_t = V_k}^{T} \gamma_j(t)}{\sum_{t=1}^{T} \gamma_j(t)}$.

$\bar{c}_i(h,k)$ = (expected number of times in state $i$
observing $V_h$ and then, after transitioning to any state $j$,
observing $V_k$) / (expected number of times in state $i$
observing $V_h$),

$\bar{c}_i(h,k) = \frac{\sum_{t=1,\ O_t = V_h,\ O_{t+1} = V_k}^{T-1} \gamma_i(t)}{\sum_{t=1,\ O_t = V_h}^{T-1} \gamma_i(t)}$.

5. RESULTS

Data pertaining to some common activities (jump, jog, dribble a
basketball, kick a soccer ball) were simulated. The data was
over-processed to remove almost all of the common features that
would normally be used to detect the activity. The reason for
using the over-processed data is to show that 1) the Evidence
Feed-Forward HMM can still detect patterns in messy, sparse
data, and 2) the Evidence Feed-Forward HMM shows a better
detection rate than standard HMMs.

First, the simulated images were extracted from the video, with
activities ranging from 20 frames to over 100 frames. Next, each
frame was processed to detect the hands, feet, and head. A
single point represented each hand, each foot, and the head.
Often these points were misrepresented. A bounding box was put
around the five points, and the height of the bounding box was
divided by the width. This number was put into one of 11 bins
equally divided between the highest and lowest value. A graph of
the bin values for each frame can be found in Figure 3. Finally,
the bin value for each frame was compared to the next frame's
bin value, and a symbol representing increase, decrease, or no
change was assigned. This symbol was the input to both the
Evidence Feed-Forward HMM and the standard HMM.
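
The preprocessing described in this section can be sketched in
Python as follows. The function names, the symbol codes, the
point format, and the choice to take the highest and lowest
ratio within a single sequence are assumptions made for
illustration; the paper does not specify these details.

```python
INCREASE, DECREASE, SAME = 0, 1, 2   # hypothetical symbol codes
N_BINS = 11


def aspect_ratio(points):
    """Height / width of the bounding box around the five body points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    if width == 0:
        raise ValueError("degenerate bounding box: zero width")
    return height / width


def to_bins(ratios, n_bins=N_BINS):
    """Map each ratio to one of n_bins equal-width bins (0 .. n_bins - 1)."""
    lo, hi = min(ratios), max(ratios)
    if hi == lo:
        return [0] * len(ratios)
    step = (hi - lo) / n_bins
    # The maximum ratio would land in bin n_bins, so clamp it to the top bin.
    return [min(int((r - lo) / step), n_bins - 1) for r in ratios]


def trend_symbols(bins):
    """Compare each bin with the next and emit increase/decrease/same."""
    symbols = []
    for cur, nxt in zip(bins, bins[1:]):
        if nxt > cur:
            symbols.append(INCREASE)
        elif nxt < cur:
            symbols.append(DECREASE)
        else:
            symbols.append(SAME)
    return symbols


# One made-up frame: two hands, two feet, and a head as (x, y) points.
frame = [(0.0, 1.0), (2.0, 1.0), (0.5, 0.0), (1.5, 0.0), (1.0, 4.0)]
ratio = aspect_ratio(frame)
bins = to_bins([1.0, 2.0, 1.5, 1.5])
symbols = trend_symbols(bins)
```

A sequence of F frames yields F - 1 symbols drawn from a
three-letter alphabet, which shows how little information
remains for either classifier after this processing.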


Fig.  3.  Processed  data  for  a  sequence  representing 
SOCCER  KICK.  The  x  values  represent  the  frame  in 
the  sequence.  The  y  values  represent  the  bin  the 
height/width  ratio  belongs  to. 


Both HMMs were trained with only four activity sequences each
for the JUMP, JOG, and DRIBBLE activities. The SOCCER KICK
activity had only three training sequences. For each activity,
the testing set did not include the training sequences. For the
JUMP activity, the Evidence Feed-Forward HMM correctly
classified 78% of the sequences, whereas the standard HMM
classified 67% correctly. For the SOCCER KICK activity, the
Evidence Feed-Forward HMM classified 100% correctly, whereas the
standard HMM classified 50% correctly. The DRIBBLE activity saw
results of 12.5% for the standard HMM and 50% for the Evidence
Feed-Forward HMM. Finally, the JOG activity did not fare well
for either classifier: the standard HMM classified JOG correctly
21% of the time, while the Evidence Feed-Forward HMM correctly
classified 46% of the sequences.

6. CONCLUSION

This paper has presented the idea and theory behind the Evidence
Feed-Forward Hidden Markov Model. The model is built on the
assumption that each observation is affected by the previous
observation, which is not the case in standard HMMs. Adding this
probability to the classifier improved the classification rate
on messy, sparse data sets for every activity tested, as shown
in the results section. To tackle the complex problems
associated with Visual Understanding, a more complex technique
needs to be developed. It is the hope of this paper to convince
the reader that the Evidence Feed-Forward HMM is one such
technique. Further studies on sparse, messy data are to be
carried out in the near future.